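High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, due to transmission or storage limitations, a massive amount of face data is usually compressed before being analyzed. The compressed images may lose powerful identity information, degrading the performance of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset comprising 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, to maximize redundancy removal without impairing robust identity information, we apply an encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. The residual feature is then fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results demonstrate that the proposed model achieves higher JND map prediction accuracy than state-of-the-art JND models, and that it saves more bits than VTM-15.0 while maintaining the performance of the FR system.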
translated by 谷歌翻译
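In this paper, a novel and effective image quality assessment (IQA) algorithm based on frequency disparity for high dynamic range (HDR) images is proposed, termed the local-global frequency feature-based model (LGFM). Motivated by the assumption that the human visual system is highly adapted to extracting structural information and partial frequencies when perceiving a visual scene, Gabor and Butterworth filters are applied to the luminance of the HDR image to extract local and global frequency features, respectively. Similarity measurement and feature pooling are then performed sequentially on the frequency features to obtain the predicted quality score. Experiments on four widely used benchmarks demonstrate that the proposed LGFM is more consistent with subjective perception than state-of-the-art HDR IQA methods. Our code is available at https://github.com/eezkni/lgfm.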
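The last two stages named in the abstract above, similarity measurement followed by feature pooling, can be illustrated with a minimal sketch. The abstract does not state the exact formulas, so everything here is an assumption for illustration: the ratio-style similarity commonly used in IQA metrics, the stabilizing constant `c`, mean pooling, and the function names `similarity_map` and `mean_pool`. The frequency features are taken as given 2-D lists; the Gabor and Butterworth filtering is not reproduced.

```python
def similarity_map(ref_features, dist_features, c=1e-3):
    """Pointwise similarity between two feature maps (assumed form).

    Uses (2ab + c) / (a^2 + b^2 + c), a ratio common in IQA metrics;
    the source abstract does not confirm this exact expression.
    """
    return [[(2.0 * r * d + c) / (r * r + d * d + c)
             for r, d in zip(ref_row, dist_row)]
            for ref_row, dist_row in zip(ref_features, dist_features)]


def mean_pool(sim_map):
    """Collapse a similarity map into one score by averaging (assumed pooling)."""
    values = [v for row in sim_map for v in row]
    return sum(values) / len(values)


# Hypothetical frequency features of a reference and a distorted image.
reference = [[0.2, 0.8], [0.5, 0.1]]
distorted = [[0.1, 0.6], [0.5, 0.4]]

score = mean_pool(similarity_map(reference, distorted))
print(f"predicted quality score: {score:.4f}")
```

With this form, identical feature maps score 1 and the score drops as the features diverge, which is the behavior a full-reference quality metric needs.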
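Free from the fundamental limitation of fitting to paired training data, recent unsupervised low-light enhancement methods excel at adjusting the illumination and contrast of images. However, for unsupervised low-light enhancement, the remaining noise suppression issue, caused by the lack of supervision on detailed signals, largely impedes the wide deployment of these methods in real-world applications. In this paper, we propose a novel Cycle-Interactive Generative Adversarial Network (CIGAN) for unsupervised low-light image enhancement, which is capable not only of better transferring illumination distributions between low-light and normal-light images but also of manipulating detailed signals between the two domains, e.g., suppressing or synthesizing realistic noise in the cyclic enhancement/degradation process. In particular, the proposed low-light guided transformation feeds the features of low-light images forward from the generator of the enhancement GAN (eGAN) to the generator of the degradation GAN (dGAN). With the information learned from real low-light images, dGAN can synthesize more realistic and diverse illumination and contrast in low-light images. Moreover, the feature randomized perturbation module in dGAN learns to increase feature randomness to produce diverse feature distributions, encouraging the synthesized low-light images to contain realistic noise. Extensive experiments demonstrate both the superiority of the proposed method and the effectiveness of each module in CIGAN.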
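We present EXE-GAN, a novel exemplar-guided facial inpainting framework using generative adversarial networks. Our approach not only preserves the quality of the input facial image but also completes the image with exemplar-like facial attributes. We achieve this by simultaneously leveraging the global style of the input image, the stochastic style generated from a random latent code, and the exemplar style of the exemplar image. We introduce a novel attribute similarity metric to encourage the network to learn the style of facial attributes from the exemplar in a self-supervised way. To guarantee a natural transition across the boundaries of inpainted regions, we introduce a novel spatially variant gradient backpropagation technique that adjusts the loss gradients based on spatial location. Extensive evaluations and practical applications on the public CelebA-HQ and FFHQ datasets validate the superiority of EXE-GAN in terms of the visual quality of facial inpainting.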
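Convolutional neural networks (CNNs) have succeeded in compressive image sensing. However, due to the inductive biases of locality and weight sharing, convolution operations have intrinsic limitations in modeling long-range dependencies. The transformer, originally designed as a sequence-to-sequence model, excels at capturing global context thanks to its self-attention-based architecture, even though it may have limited localization ability. This paper proposes a hybrid framework that integrates the detailed spatial information provided by CNNs with the global context provided by transformers for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method composed of adaptive sampling and recovery. In the sampling module, images are measured block by block with a learned sampling matrix. In the reconstruction stage, the measurements are projected into dual stems: a CNN stem that models neighborhood relationships by convolution, and a transformer stem that adopts a global self-attention mechanism. The two branches run concurrently, and local features and global representations are fused at different resolutions to maximize the complementarity of the features. Furthermore, we explore a progressive strategy and a window-based transformer block to reduce the number of parameters and the computational complexity. Experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing, which achieves superior performance compared with state-of-the-art methods on different datasets.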
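The block-by-block measurement step described above can be sketched as follows. This is only an illustration of block-based compressive sampling, y = Φx per flattened block, under stated assumptions: the sampling matrix here is random Gaussian as a stand-in for the learned matrix, and the helper names (`split_blocks`, `make_sampling_matrix`, `sample_blocks`), block size, and measurement count are hypothetical rather than taken from the paper.

```python
import random


def split_blocks(image, block):
    """Split an H x W image (list of rows) into flattened, non-overlapping blocks."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    patches = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            patches.append([image[top + r][left + c]
                            for r in range(block) for c in range(block)])
    return patches


def make_sampling_matrix(n_measurements, block, seed=0):
    """Random Gaussian matrix standing in for the learned sampling matrix (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(block * block)]
            for _ in range(n_measurements)]


def sample_blocks(image, phi, block):
    """Measure every block with the same matrix: y = phi @ x for each flattened block x."""
    return [[sum(w * v for w, v in zip(row, patch)) for row in phi]
            for patch in split_blocks(image, block)]


# An 8 x 8 toy image, 4 x 4 blocks, 4 measurements per 16-pixel block (ratio 0.25).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
phi = make_sampling_matrix(n_measurements=4, block=4)
measurements = sample_blocks(image, phi, block=4)
print(len(measurements), "blocks,", len(measurements[0]), "measurements each")
```

In an end-to-end method the entries of `phi` would be trained jointly with the reconstruction network; the reconstruction stage itself is not sketched here.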
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
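To make "the student is encouraged to mimic the teacher" concrete, the sketch below shows a generic soft-label distillation loss: the KL divergence between temperature-softened teacher and student class distributions. This is the widely used standard KD objective, shown here as an assumption for illustration. It is not RELIANT and contains no fairness term, and the function names, temperature value, and example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic KD objective (assumption); the abstract does not specify
    the loss used by any particular GNN-based KD framework.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


# Hypothetical per-node class logits from a teacher GNN and a student model.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, -0.5]
print(f"distillation loss: {distillation_loss(student, teacher):.4f}")
```

The loss is zero when the student reproduces the teacher's softened distribution exactly, which also means any bias in the teacher's predictions is a target the student is pushed toward; that is the inheritance problem the abstract describes.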
This paper focuses on designing efficient models with few parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.